Molecular & Cellular Proteomics

Elsevier BV

Preprints posted in the last 30 days, ranked by how well they match Molecular & Cellular Proteomics's content profile, based on 158 papers previously published here. The average preprint has a 0.11% match score for this journal, so anything above that is already an above-average fit.

1
Stoichiometry-dependent specificity in biotin enrichment: a benchmarking framework for proximity labeling proteomics

Zala, C. A.; Trueba Sanchez, M. C.; van den Bor, J.; Willemsens, T.; Verweij, F. J.; Altelaar, M.; Stecker, K.

2026-05-11 molecular biology 10.64898/2026.05.07.723439 medRxiv
Top 0.1%
23.4%

Proximity labeling methods (including BioID, TurboID, and ultraID), along with surface proteomics and microdomain mapping, enable proteome-wide identification of spatially proximal proteins via MS-based analysis. These workflows require specific enrichment of biotinylated proteins using affinity purification, yet enrichment specificity can often be compromised by non-specifically bound proteins. As labeling strategies are increasingly applied to complex biological samples with low protein input or low biotin stoichiometry, accurately distinguishing true targets from background becomes a major analytical challenge. Despite its critical impact on data quality and interpretation, the influence of biotinylation level and protein input on enrichment performance remains poorly characterized, limiting the reliability of proximity labeling experiments. To address this, we establish a quantitative benchmarking framework that systematically evaluates biotin enrichment under controlled conditions, including scenarios of low biotin stoichiometry. Using this setup, we show that enrichment specificity strongly depends on biotin stoichiometry: higher levels of biotinylation in samples yield high specificity, whereas low biotinylation increases non-specific background. Reduced protein input further limits recovery of true targets, yet maintains enrichment specificity, highlighting sensitivity constraints of enrichment-based workflows. We apply this framework to biotinylated extracellular vesicle (EV) cargo uptake in recipient cells using ultraID-CD63 labeling. Detection of the most abundant EV cargo proteins under low biotinylation conditions indicates that current workflows approach the lower bounds of biotin enrichment sensitivity. Together, these standards provide a practical reference for evaluating and optimizing biotin enrichment workflows, supporting quantitative and reproducible proximity labeling in proteomics.

2
From Peaks to Power: Systematic Evaluation of Chromatographic Sampling Reveals Determinants of Quantification and Biological Discovery in DIA Proteomics

Cantrell, L. S.; Just, S.; Stukalov, A.; Farokhzad, O. C.; Batzoglou, S.

2026-05-16 bioinformatics 10.64898/2026.05.13.724964 medRxiv
Top 0.1%
23.1%

Modern DIA proteomics increasingly emphasizes throughput and depth for large-cohort studies, but methods are often optimized using proxy metrics that can mask losses in quantifiable signal and statistical power. Here, we evaluate how datapoints per peak (DPPP) and other chromatographic features jointly contribute to quantification and downstream biological discovery. Using a matrix-matched calibration curve dataset, we examined how DPPP affects the limits of detection and quantification (LOD/LOQ). Reduced DPPP minimally affected LOD but substantially degraded LOQ. Feature modeling and nonparametric association analyses identified precursor peak area as the strongest feature-level predictor of LOQ, whereas DPPP showed weaker and context-dependent effects. Simulations of chromatographic peak integration recapitulated these trends, showing that increased sampling primarily improves integration precision, while quantitative accuracy is strongly governed by peak height and peak shape. Finally, when comparing 20 cancer and 20 control plasma samples processed with Seer Proteograph, decreasing DPPP led to a loss of statistical significance for proteins with low-abundance precursors. These findings argue that DIA optimization should prioritize LOQ and statistical power metrics, not identifications alone, by balancing sampling density with chromatographic peak height and quality to maximize useful biological signal.
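
The claim that denser sampling mainly buys integration precision can be illustrated with a toy simulation. This is a minimal sketch, not the authors' simulation: it assumes an idealized Gaussian elution profile, constant additive noise, a random scan phase relative to the apex, and trapezoidal integration, and all function names are hypothetical.

```python
import math
import random
import statistics


def gaussian_peak(t, height, sigma):
    """Intensity of an idealized Gaussian elution profile centred at t = 0."""
    return height * math.exp(-0.5 * (t / sigma) ** 2)


def integrate_sampled_peak(n_points, rng, height=1.0, sigma=1.0, noise_sd=0.02):
    """Sample the peak at n_points evenly spaced scans spanning about +/-3 sigma,
    add Gaussian noise to each scan, and integrate with the trapezoidal rule."""
    step = 6 * sigma / (n_points - 1)
    # Scan timing is not aligned with the apex, so shift the sampling grid randomly.
    offset = rng.uniform(-step / 2, step / 2)
    times = [-3 * sigma + offset + i * step for i in range(n_points)]
    values = [gaussian_peak(t, height, sigma) + rng.gauss(0.0, noise_sd) for t in times]
    return sum((values[i] + values[i + 1]) * step / 2 for i in range(n_points - 1))


def area_stats(n_points, n_reps=500, seed=0, **peak_kwargs):
    """Mean integrated area and its coefficient of variation over repeated injections."""
    rng = random.Random(seed)
    areas = [integrate_sampled_peak(n_points, rng, **peak_kwargs) for _ in range(n_reps)]
    mean_area = statistics.mean(areas)
    return mean_area, statistics.stdev(areas) / mean_area


if __name__ == "__main__":
    for n in (4, 8, 16):
        mean_area, cv = area_stats(n)
        print(f"{n:2d} points/peak: mean area {mean_area:.3f}, CV {cv:.2%}")
```

Under these assumptions the coefficient of variation of the integrated area falls as `n_points` grows, and lowering `height` at a fixed `noise_sd` raises it at every sampling density, which loosely mirrors the reported dependence on peak height.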

3
Trypsin exhibits exopeptidase-like activity toward N-terminal arginine that biases proteomic analyses

Ambrose, E. A.; Kandasamy, G.; Meulener, M. M.; Zhang, F.

2026-05-16 biochemistry 10.64898/2026.05.15.725550 medRxiv
Top 0.1%
23.1%

Many proteomics protocols rely on enzymatic digestion of complex protein mixtures to generate peptides with predictable cleavage patterns for mass spectrometry analysis. One of the most widely used enzymes, trypsin, is classically defined as a serine endopeptidase with high specificity for cleaving peptide bonds on the C-terminal side of internal lysine and arginine residues. Accordingly, trypsin is not expected to remove an N-terminal arginine, which may arise through posttranslational modification such as arginylation or through proteolysis that exposes internal residues as new N-termini. N-terminal arginine plays important biological roles, including functioning as an N-degron and modulating protein interactions/signaling through its positive charge. Curiously, prior mass spectrometry-based studies utilizing trypsin to identify proteins bearing N-terminal arginine have frequently reported low and inconsistent yields, suggesting potential systematic bias in current proteomic approaches. Here, we explored whether trypsin affects the integrity of the N-terminal arginine. Using antibodies that specifically recognize the N-terminal arginine of different peptides, together with mass spectrometry-based peptide analysis, we show that trypsin can remove N-terminal arginine residues in an exopeptidase-like manner. This effect occurs across a range of digestion conditions consistent with standard proteomic workflows, on peptides or whole proteins, and depends on trypsin concentration, incubation time, and catalytic activity. In addition, we show that the alternative arginine-cleavage enzyme Arg-C can also affect N-terminal arginine in a sequence-dependent context. In contrast, Lys-C and LysargiNase do not exhibit such effects, providing suitable alternative digestion strategies.
Together, these findings reveal an unappreciated enzymatic behavior of arginine-cleaving proteases and suggest that their widespread use may systematically compromise the detection of N-terminal arginine in proteomic studies.

4
LAMPrEY: a Python-based automated quality control tool for large-scale proteomics datasets

Valdes-Tresanco, M. E.; Wacker, S.; Valdes-Tresanco, M. S.; Plakhotnyk, A.; Brodie, N. I.; Hepburn, M.; Ulke-Lemee, A.; Huttlin, E. L.; Lewis, I. A.

2026-05-11 bioinformatics 10.64898/2026.05.06.722826 medRxiv
Top 0.1%
22.8%

In recent years, proteomics has moved increasingly towards the analysis of large cohorts of biological specimens. This has been made possible by significant improvements in mass spectrometry technology, chromatographic separation methods, and data acquisition strategies. These technological advances now routinely enable experiments that yield vast datasets that substantially outstrip the capacity of existing proteomics data analysis approaches. Processing such large datasets requires purpose-built quality control tools designed to organize and analyze the data while recording all processing parameters for reproducibility. To address this need, we developed an open-source, Python-based software platform, Large-scale Automated Multi-level Proteomics Evaluation by Python (LAMPrEY), a comprehensive quality-control pipeline for quantitative proteomics analyses of large cohorts of samples. LAMPrEY features GUI-based file submission, automated processing with MaxQuant and RawTools, an interactive analytics dashboard, and an application programming interface (API) for programmatic usage that collectively enable rapid, reproducible analysis and interpretation of proteomics data. We demonstrate the longitudinal monitoring and analytical capabilities of LAMPrEY using TMT11 quantitative proteomics data generated from 910 Enterococcus faecium isolates collected from bloodstream infection patients. LAMPrEY is open-source software available at www.lewisresearchgroup.org/software.

5
Analysis of Confounding Factors in Reactive Cysteine Profiling Reveals Enhanced Chromatin-Protein Association via CDK7 Inhibition by THZ1

Yang, K.; Li, S.; Li, B.; Richards, D.; Dong, K.; Seneviratne, U.; Lee, W.; Iannetta, A.; Xu, H.; Gygi, S.; Yu, Q.

2026-05-07 cell biology 10.64898/2026.05.05.721470 medRxiv
Top 0.1%
22.8%

Recent advances in activity-based proteome profiling (ABPP) have enabled global mapping of cysteine ligandability, uncovering novel biological insights and opportunities for identifying disease vulnerabilities. While both live cell-based and native lysate-based ABPP have been applied, how cysteine ligandability differs between these systems and what factors influence these measurements remain unclear. Building on our previous development of a high-throughput TMT-ABPP workflow for native lysates, here we adapt the protocol for live cells and systematically compare cysteine ligandability across both platforms. Our analysis reveals three major contributors to the discrepancies: in-cellulo cysteine accessibility, protein abundance changes, and protein relocalization. Notably, we show that the CDK7 inhibitor THZ1 induces substantial protein relocalization and promotes chromatin binding. Together, these results provide a practical framework for ABPP experimental design and data interpretation, supporting more accurate application of ABPP in functional proteomics and drug discovery.

6
Reference-Based Library Construction Improves Performance in low-input diaPASEF Workflows

Charkow, J.; Ghaznavi, M.; Seale, B.; Peng, J.; Gingras, A.-C.; Rost, H.

2026-05-04 bioinformatics 10.64898/2026.04.29.721088 medRxiv
Top 0.1%
22.6%

In low-input mass spectrometry-based proteomics, data-independent acquisition (DIA), including diaPASEF, is quickly becoming the method of choice for label-free quantification. Whether empirical or in silico spectral libraries are used, performance depends on the library; however, the optimal library construction strategy for low-input proteomics remains an open question. To address this, we examine and develop library construction approaches that are compatible with both spectrum-centric and peptide-centric analysis workflows. These approaches leverage a closely related, high-quality sample to improve library quality. First, we validated our approach with bulk sample amounts, where we observed that the effects of gas-phase-fractionation-based library construction depend on the software framework, with improvements more pronounced in OpenSWATH than in DIA-NN. In OpenSWATH, our peptide-centric library reconstruction workflow consistently outperforms a transfer learning strategy, an emerging alternative approach. In DIA-NN, trends depend on the library source, highlighting OpenSWATH's stronger dependence on the search space. In low-input applications, such as single-cell-equivalent injection amounts (100 pg) of HeLa cell digest on a timsTOF SCP, our library construction approach provided more pronounced improvements across both software tools than in bulk samples. Using a peptide-centric reconstruction approach with the OpenSWATH analysis framework, we detected over 15,000 peptide precursors (2480 protein groups), a 90% improvement over the original library. Furthermore, using a spectrum-centric construction approach, peptide precursor identification rates improved more than 6-fold (~1000 to ~6000). Our strategy provides a practical solution for generating high-quality libraries in low-input applications.

7
Mapping the interactome of human tRNA methyltransferase TRMT1 using dual proximity labeling

D'Oliviera, A.; Olson, S.; Bernhard, H.; Yu, Y.; Mugridge, J. S.

2026-05-19 biochemistry 10.64898/2026.05.18.725941 medRxiv
Top 0.1%
19.0%

Transfer RNA methyltransferase 1 (TRMT1) installs N2-methylguanosine and N2,N2-dimethylguanosine modifications at position 26 of mammalian tRNAs, supporting tRNA structure, translation, and cellular response to redox stress. However, the local environment and interactome of TRMT1 in the cell are poorly defined. Here, we use APEX2-based proximity labeling of the N- and C-terminus of TRMT1, coupled with label-free quantitative proteomics, to map candidate TRMT1-proximal proteins in HEK293T cells. Mass spectrometry data were acquired using both data-independent acquisition (DIA) and data-dependent acquisition (DDA) methods, and DIA substantially increased proximity proteome coverage, reproducibility, and the number of significantly enriched candidate hits compared with DDA. N- and C-terminal APEX2-TRMT1 constructs captured largely overlapping proteomes, suggesting the dual-labeling strategy provides a robust map of proximal proteins. Analysis of the significant TRMT1-proximal proteins reveals enrichment in RNA processing and ribonucleoprotein-associated factors, in addition to hits connected to tRNA modification, tRNA biogenesis, and redox-associated biology. These data provide a proteome-scale view of TRMT1-associated cellular proteins and environments, and lay the groundwork for future validation of functional TRMT1 interaction networks.

Significance
- Fusing the APEX2 enzyme to both the N-terminus and the C-terminus of the bait enhanced sensitivity for identifying protein interactions.
- Combining APEX2-based endogenous labeling with DIA mass spectrometry increases the reproducibility and depth of the proximity proteome.
- The study provides a rich source of candidate interacting or proximal proteins of TRMT1, which warrant further validation studies.

8
Systematic characterization of the yeast secretome under diverse proteosynthetic stress conditions reveals secretion of functional ER chaperone BiP

Liu, S.; Schulz, B. L.

2026-05-22 biochemistry 10.64898/2026.05.21.727034 medRxiv
Top 0.1%
18.9%

The yeast secreted proteome plays critical biological roles and influences product and production parameters in industrial fermentation. Systematic profiling of the response of the yeast secretome to intrinsic and extrinsic factors is therefore essential for understanding these functions and for optimizing manufacturing processes. Here, we characterized the yeast secretome under diverse proteosynthetic stress conditions, including glycosylation deficiency, oxidative, reductive, and thermal stresses. The secretome was predominantly composed of conventionally secreted proteins, while a subset of proteins appeared to be secreted via unconventional pathways. Distinct secretome profiles were observed in response to different stressors, driven by a combination of altered intracellular proteomes, altered canonical secretion, and altered cell lysis and unconventional protein secretion, while reflecting the underlying metabolic state of the cells. Heat stress did not impact protein glycosylation but did cause protein misfolding stress similar to that caused by N-glycosylation deficiency. Intriguingly, the canonically intracellular chaperone BiP was abundant in the secretome under particular stress conditions in which its activity would be beneficial. BiP interacted with probable extracellular client proteins in vitro, consistent with it acting as a functional extracellular chaperone/holdase in conditions such as reductive stress in which client proteins could be misfolded outside the cell.

9
Top-down Sequencing of Intact Proteoforms using the timsOmni mass spectrometer: Accurate Determination of Co-occurring Histone Modifications

Berthias, F.; Bilgin, N.; Smyrnakis, A.; Le Boiteux, E.; Kosmopoulou, M.; Albers, C.; Suckau, D.; Mecinovic, J.; Papanastasiou, D.; Jensen, O. N.

2026-05-05 biochemistry 10.64898/2026.05.01.722147 medRxiv
Top 0.1%
18.6%

Deep characterization of intact proteoforms remains an analytical challenge in functional proteomics, particularly for heterogeneous multi-site post-translational modifications at distinct amino acid residues. Histones are among the most dynamically and diversely post-translationally modified proteins in eukaryotic cells, carrying multiple, co-occurring and reversible modifications that can give rise to isomeric proteoform species. Tandem mass spectrometry with multimodal fragmentation capabilities is a promising approach for deep characterization of intact proteoforms, such as modified histones. We applied the novel timsOmni mass spectrometer, which incorporates the Omnitrap platform enabling multimodal MS workflows, for residue-level mapping of histone modifications, including acetylation and methylation. Recombinant histones H3.1 and H4 were acetylated in vitro by the enzymes GCN5, PCAF, and p300 to generate mono- and multi-acetylated proteoforms. Complementary MS2 electron- and collision-based dissociation (ECD, EID, RCID and ECciD), together with MS3 strategies, produced complete or near-complete backbone fragmentation of intact protein ions (>92% amino acid sequence coverage). For monoacetylated species generated by the more site-selective lysine acetyltransferases, the dominant proteoform matched the known catalytic preferences of the enzymes (H3.1K14ac for GCN5 and PCAF, and H4K8ac for PCAF), while minor positional isomers were also identified and their relative abundance estimated. In contrast, the broader substrate specificity of p300 produced a wide distribution of H4 proteoforms bearing up to seven acetylated lysine residues. Species carrying six and seven acetylations were characterized by multimodal MS2/MS3 experiments, enabling localization of individual acetylation sites and discrimination of positional isomers.
Finally, endogenous histone proteoforms from liver extracts were analyzed, yielding sequence coverages of 92-93% for the most abundant species and enabling confident localization of multiple PTMs (acetylation and methylation). These results illustrate that multimodal MSn fragmentation of intact proteins supports residue-level assignment of combinatorial histone marks and coexisting positional isomers.

[Graphical abstract]

Highlights
- Multimodal MS2/MS3 maps histone PTMs on intact proteins.
- ECD, EID, RCID, and ECciD provide complete or near-complete sequence coverage.
- MS3 localizes acetylation sites and distinguishes positional isomers.
- Endogenous H4 proteoforms are assigned with site-specific PTM mapping.

10
Capillary-based Subcellular Sampling Uncovers the Stress Granule Proteome in Single Cells

Davison, C.; Locker, N.; Marques, M.; Kelly, S.; Relton, E.; Sharma, T.; Fraser, E.; Aragon Fernandez, P.; Schoof, E. M.; Petersen, M.; Pascoe, J.; Lilley, K. S.; Pinto, S. M.; Spick, M.; Bailey, M.

2026-05-13 cell biology 10.64898/2026.05.11.724230 medRxiv
Top 0.1%
18.5%

Many diseases arise from dysfunction within specific organelles or biomolecular condensates, highlighting the value of analysing proteins at subcellular resolution to uncover new biological mechanisms. We report a novel capillary-based subcellular sampling workflow coupled with liquid chromatography-mass spectrometry (LC-MS) for proteomic analysis of defined subcellular regions of individual cells. We applied this methodology to stress granules (SGs), membrane-less biomolecular condensates that form in response to cellular stress (including viral infection), and are implicated in infection, neuropathology and cancer. Comprehensive characterisation of SG protein composition remains limited by technical challenges associated with bulk purification, including loss of spatial context, dynamic behaviour and contamination from cytosolic material. Using our novel method, we identified a high-confidence set of 405 SG-associated proteins, including 46 established SG residents alongside numerous previously unreported candidates. Functional enrichment analysis revealed pathways consistent with known SG biology, while comparison with an independent cytosolic proteome dataset demonstrated minimal overlap, supporting the specificity of the sampling strategy. Selected novel SG protein candidates (AHNAK2, DDX39B, NUDT1 and FKBP2) were validated using immunofluorescence microscopy. These findings establish capillary-based subcellular sampling as a viable approach for proteomic analysis of SGs with preserved spatial context and provide a framework for analysing other subcellular compartments.

Table of contents: We report an LC-MS-based capillary sampling workflow for proteomic analysis of subcellular structures within single cells. This methodology identified 405 high-confidence stress granule-associated proteins, including 46 previously established and numerous novel candidates. The approach demonstrated high specificity and preserved spatial context, expanding the capabilities of subcellular proteomics.

[Table of contents graphic; figure made in Biorender.com]

11
Simultaneous single-cell profiling of the transcriptome and proteome

Xu, X.; Caggiano, M. P.; Wells, M. L.; Sun, G.; Lim, S. M.; Multari, D. H.; Blundell, S. A.; Hartel, N.; Viner, R.; Polo, J. M.; Schittenhelm, R.; de Marco, A.

2026-05-15 systems biology 10.64898/2026.05.14.724921 medRxiv
Top 0.2%
12.9%

Transcriptomic and proteomic measurements from the same single cell provide complementary information that cannot be inferred from either modality alone, yet methods for the parallel recovery of both analyte classes from a single-cell lysate remain limited. Here, we describe a workflow in which individual cells are isolated by automated dispensing into a minimal, MS-compatible lysis volume, followed by sequential mRNA capture and protein supernatant recovery, prior to independent downstream processing. The method is compatible with standard library preparation and data-independent acquisition proteomics pipelines and requires no dedicated instrumentation beyond a single-cell dispensing platform. We evaluated workflow performance on 67 single cells across 3 iBlastoids. Transcriptomic sequencing detected a median of 5375 genes per cell, and proteomic analysis identified a median of 2123 protein groups per cell across two mass spectrometry platforms. Compared with a standalone single-cell proteomics protocol, incorporating the mRNA extraction step reduced median proteomic depth by approximately 11% (median 1,965 vs. 2,204 protein groups per cell), while mean per-cell identification remained comparable across workflows (1,790 vs. 1,775 protein groups per cell). Direct comparison of paired transcript and protein abundance yielded a median Spearman correlation of ρ ≈ 0.38; after correction for detection depth, the partial correlation was 0.067.
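
One way a raw correlation of about 0.38 can shrink to a partial correlation near 0.067 is when both modalities partly track a shared detection covariate. The abstract does not say how the correction was done; a common approach, assumed here, is a partial Spearman correlation: rank-transform all variables, regress the covariate's ranks out of the RNA and protein ranks, and correlate the residuals. The covariate (`depth`) and the toy data generator are hypothetical, the sketch computes a single correlation rather than the median the abstract reports, and it uses only the Python standard library.

```python
import math
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residualize(y, x):
    """Residuals of the least-squares line y ~ a + b*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def partial_spearman(x, y, covariate):
    """Spearman correlation of x and y after removing the covariate's ranks from both."""
    rz = average_ranks(covariate)
    return pearson(residualize(average_ranks(x), rz), residualize(average_ranks(y), rz))


def simulate_depth_confounded(n=500, noise=0.1, seed=0):
    """Hypothetical toy data: RNA and protein both follow a per-gene detection
    depth and share no other signal."""
    rng = random.Random(seed)
    depth = [rng.random() for _ in range(n)]
    rna = [d + rng.gauss(0.0, noise) for d in depth]
    protein = [d + rng.gauss(0.0, noise) for d in depth]
    return rna, protein, depth


if __name__ == "__main__":
    rna, protein, depth = simulate_depth_confounded()
    print(f"Spearman {spearman(rna, protein):.2f}, "
          f"partial {partial_spearman(rna, protein, depth):.2f}")
```

With these toy data the raw Spearman correlation should be high and the partial correlation close to zero, because depth is the only signal the two modalities share.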

12
Systems-Informed prioritization of Exosomal Protein Candidates in TNBC Identifies an ECM Invasion Module and Nominates Agrin as a High-Priority Target

Nguyen, T. M.

2026-05-19 cancer biology 10.64898/2026.05.14.725271 medRxiv
Top 0.2%
12.4%

Background: Triple-negative breast cancer (TNBC) remains the most clinically challenging breast cancer subtype, in part due to the absence of validated molecular targets and the limited availability of non-invasive early detection strategies. Tumor-derived exosomes have emerged as promising liquid biopsy analytes, yet the functional organization of their protein cargo and the identification of biologically meaningful candidates remain incompletely characterized.
Methods: We present a Composite Driver Score (CDS) framework that integrates differential expression magnitude with protein-protein interaction network topology and Analytic Hierarchy Process (AHP)-based multi-criteria weighting to prioritize exosomal protein candidates in a systems-informed manner. The framework was applied to publicly available label-free quantitative proteomic datasets comparing MDA-MB-231 (TNBC) and MCF-10A (non-tumorigenic) exosomal fractions, with cross-dataset validation performed on an independent proteomic dataset.
Results: CDS prioritization demonstrated robustness to variations in proteome depth and parameter weighting, consistently recovering a functionally coherent set of extracellular matrix (ECM) and adhesion-associated proteins. Network and pathway analyses revealed coordinated co-enrichment of integrin receptors, cognate ECM ligands, and associated co-receptors, consistent with selective packaging of a functionally integrated invasion module. Agrin (AGRN), a heparan sulfate proteoglycan with limited prior characterization in TNBC exosome biology, emerged as a high-priority candidate through its network integration within this ECM program.
Conclusions: These findings support a model in which TNBC-derived exosomes carry coordinated molecular programs capable of modulating extracellular matrix architecture. The CDS framework offers a transferable strategy for integrative exosomal biomarker prioritization and a systems-level foundation for targeted liquid biopsy panel development.
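
The abstract names the ingredients of the Composite Driver Score but not its formula. The sketch below shows one common way to combine such ingredients, assumed rather than taken from the preprint: derive criterion weights from an AHP pairwise comparison matrix with the row geometric-mean method, min-max normalize each criterion, and take the weighted sum. The criteria (`abs_log2fc`, `degree`), the comparison value, and the toy protein values are all hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Criterion weights from a reciprocal AHP pairwise comparison matrix,
    using the row geometric-mean approximation to the principal eigenvector."""
    n = len(pairwise)
    gmeans = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(gmeans)
    return [g / total for g in gmeans]


def min_max(values):
    """Rescale raw criterion values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def composite_scores(criteria, weights):
    """Weighted sum of min-max normalized criteria.

    criteria: dict of criterion name -> raw values, one per protein, in the same
    protein order for every criterion; weights follow the dict's insertion order."""
    columns = [min_max(values) for values in criteria.values()]
    n_proteins = len(columns[0])
    return [sum(w * col[i] for w, col in zip(weights, columns)) for i in range(n_proteins)]


if __name__ == "__main__":
    # Hypothetical judgement: expression change is 3x as important as network degree.
    weights = ahp_weights([[1, 3], [1 / 3, 1]])
    scores = composite_scores(
        {"abs_log2fc": [1.0, 3.0, 2.0], "degree": [10, 2, 6]},  # toy values for 3 proteins
        weights,
    )
    print(weights, scores)
```

A full AHP treatment would also check the consistency ratio of the comparison matrix before trusting the weights; that step is omitted here for brevity.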

13
Manchester Proteome Profiler: A User-Friendly Platform for Quantitative Proteomic Analysis

Cain, S. A.; Fatima, M.; Humphries, M.

2026-05-18 bioinformatics 10.64898/2026.05.14.725092 medRxiv
Top 0.2%
10.4%

Manchester Proteome Profiler (MPP) is an open-source R Shiny application that streamlines downstream analysis of quantitative proteomic data. Compatible with grouped protein intensity tables from MaxQuant, FragPipe, Proteome Discoverer and other custom layouts, MPP provides an integrated platform for filtering, normalisation, imputation, differential expression analysis and cluster analysis across user-chosen experimental conditions. MPP supports both single- and dual-dataset comparisons, incorporates SAINTexpress for affinity purification and proximity labelling experiments, and links clusters of significant proteins to functional enrichment and interaction networks via Gene Ontology, BioGRID and STRING. Benchmarking with a KRAS proximity biotinylation dataset demonstrated the ability of MPP to identify reproducible clusters of differentially expressed proteins and reveal biologically meaningful patterns, including enrichment of solute carrier transporters and adhesion molecules. With interactive visualisations, customisable reports, and support for complex experimental designs, MPP offers a novel, versatile and user-friendly environment for proteomic data exploration and hypothesis generation.

14
Development of a Xylene-Free Sample Preparation Protocol for Quantitative Proteomics of Clinically Relevant Formaldehyde-Fixed Paraffin-Embedded Needle Biopsy Samples

Moagi, M.; Beke, L.; Mehes, G.; Kecskemeti, G.; Szabo, Z.; Turiak, L.; Csosz, E.

2026-05-14 molecular biology 10.64898/2026.05.12.724492 medRxiv
Top 0.2%
10.2%

Fresh-frozen tissues are considered the gold standard for proteomic analyses due to superior preservation of protein integrity; however, their use is limited by the logistical and financial requirements of long-term storage. Formaldehyde-fixed paraffin-embedded (FFPE) tissues provide a practical alternative owing to their stability and widespread availability in clinical settings. A critical step in FFPE proteomics is deparaffinization, which traditionally relies on organic solvents such as xylene, along with efficient reversal of formaldehyde-induced crosslinks. In this study, we evaluated multiple FFPE protein extraction and digestion workflows including chaotropic, surfactant-based, and detergent-free approaches in combination with xylene-free deparaffinization strategies, using label-free data-independent acquisition (DIA) LC-MS/MS. Among the tested methods, a chaotropic-, reductant-, and surfactant-free in-solution digestion workflow demonstrated robust protein and peptide recovery. A modified version of this protocol further improved peptide coverage while maintaining comparable protein depth. The applicability of the optimized workflow was assessed using FFPE needle biopsy samples from control, hepatic steatosis, and liver fibrosis groups. Distinct proteomic patterns were observed across conditions, with hepatic steatosis associated with early activation of stress-response pathways, while fibrosis showed evidence suggesting altered lipid metabolism. Overall, this study presents a simple, xylene-free, and MS-compatible workflow for FFPE proteomics that is suitable for low-input clinical samples and may support broader application of archival tissues in proteomic research.

15
Predicting and Elucidating Peptide Retention Mechanisms with Graph Attention Networks

Kensert, A.; Hruzova, K.; Devreese, R.; Nameni, A.; Declercq, A.; Gabriels, R.; Martens, L.; Bouwmeester, R.; Urban, J.

2026-05-20 bioinformatics 10.64898/2026.05.18.725893 medRxiv
Top 0.2%
10.2%

Liquid chromatography (LC) is a key technology in bottom-up proteomics, separating proteolytic peptides to decrease sample complexity, enhance coverage, and increase the robustness of protein identification and quantification. Although high-resolution mass spectrometry has advanced significantly, comparable progress in LC has lagged, primarily due to a limited understanding of peptide-column interactions. To bridge this knowledge gap, we introduce a novel deep learning model (PeptideGNN) based on a graph neural network (GNN) architecture to model and elucidate peptide behavior across various separation conditions. After training the model to predict peptide retention times on ten diverse proteomic datasets, we applied a saliency mapping technique to interpret the underlying retention mechanisms. Our model consistently outperformed existing retention-time predictors across multiple datasets, and the saliency mapping revealed insights into peptide-stationary phase interactions, highlighting the effects of neighboring amino acids, post-translational modifications (PTMs), chromatographic columns, and mobile phase additives on peptide retention.

16
PEXMap: A proteogenomic method for exon and isoform level mapping of mass spectrometry derived peptides

Awasthi, D.; Verma, P.; Pandit, S. B.

2026-05-04 systems biology 10.64898/2026.04.29.721330 medRxiv
Top 0.3%
8.9%

Alternative splicing (AS) expands transcriptome and proteome diversity by differentially combining exons or their splice variants. Although RNA-seq studies have uncovered transcriptomic variability, understanding of the corresponding protein-level diversity remains limited. Mass spectrometry-based proteomics provides protein-level insights through MS/MS peptide annotations, which are mostly linked to gene, transcript, or UniProt identifiers. However, tracing them to specific isoforms remains challenging due to the lack of exon mapping or inconsistent annotations. We developed PEXMap (PeptideEXonMapper), a k-mer-based proteogenomic framework that systematically maps MS/MS peptides to genes, transcripts, exons, or exon-exon junctions by exact matching of unique 8-mers derived from MS/MS peptides to those in reference databases built from exon-resolved isoforms. PEXMap mappings of the human proteome from PeptideAtlas were concordant with PeptideAtlas annotations. Applying PEXMap to liver and pancreas proteomes, we identified tissue-specific isoform expression and similarly annotated the cancer proteome. Reliable PEXMap mappings could provide insights into the role of AS in shaping proteomes across tissues and disease states. Source code is publicly available on GitHub (https://github.com/deepanshicbg/PEXMap) and is supported on Linux.
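
The core of exact k-mer mapping is an index from every k-mer in the reference features to the features that contain it. The sketch below is not PEXMap's code: it assumes that "unique" means a k-mer found in exactly one reference feature, represents an exon-exon junction as the last k-1 residues of one exon joined to the first k-1 residues of the next (so every k-mer in it spans the boundary), and uses hypothetical function names and toy sequences.

```python
from collections import defaultdict

K = 8  # k-mer length stated in the abstract


def kmers(seq, k=K):
    """All distinct length-k substrings of an amino-acid sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def junction_segment(left_exon, right_exon, k=K):
    """Hypothetical junction feature: last k-1 residues of one exon joined to the
    first k-1 residues of the next, so every k-mer in it spans the boundary."""
    return left_exon[-(k - 1):] + right_exon[:k - 1]


def build_index(features, k=K):
    """Map each k-mer to the set of reference features (exons, junctions) containing it."""
    index = defaultdict(set)
    for feature_id, seq in features.items():
        for kmer in kmers(seq, k):
            index[kmer].add(feature_id)
    return index


def map_peptide(peptide, index, k=K):
    """Features supported by the peptide's k-mers that occur in exactly one feature."""
    hits = set()
    for kmer in kmers(peptide, k):
        owners = index.get(kmer, set())
        if len(owners) == 1:  # ignore k-mers shared between features
            hits |= owners
    return hits


if __name__ == "__main__":
    exons = {"E1": "MKTAYIAKQR", "E2": "QISFVKSHFS"}  # toy sequences
    features = {**exons, "E1|E2": junction_segment(exons["E1"], exons["E2"])}
    index = build_index(features)
    for pep in ("MKTAYIAK", "YIAKQRQIS", "LGLIE"):
        print(pep, sorted(map_peptide(pep, index)))
```

Peptides shorter than k yield no k-mers and stay unmapped, and a peptide whose unique k-mers point to different features returns all of them; rolling such hits up to transcripts and genes is left out of this sketch.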

17
GlyComboCLI enables command line-based FAIR workflows for glycan composition assignment in mass spectrometry data

Kelly, M. I.; Thang, W. C. M.; Pang, C. N. I.; Gustafsson, O. J. R.; Ashwood, C.

2026-05-14 bioinformatics 10.64898/2026.05.13.725058 medRxiv
Top 0.3%
8.9%

Glycans are integral biomolecules whose presence cannot be predicted from genomic data alone, necessitating experimental characterisation through approaches including mass spectrometry. Assigning glycan compositions to observed mass-to-charge ratios is computationally challenging because of the potential diversity of monosaccharides, and existing tools lack the flexibility required for integration into automated bioinformatic workflows. Here, we present GlyComboCLI, an open-source command-line application for assigning glycan compositions to mass spectrometry data, which expands upon our previous GUI application, GlyCombo. GlyComboCLI accepts mass lists and vendor-neutral mzML files and supports an extensive range of monosaccharides, derivatisation states, reducing-end modifications, and adducts to ensure compatibility with a breadth of glycomics approaches. Outputs are compatible with downstream tools including Skyline and GlycoWorkBench. The software is deployable as a standalone executable, a Docker container, and a Galaxy tool, adhering to FAIR principles. When applied to 52 raw files from a published mouse glycomics dataset, a local instance completed composition assignment and downstream quality control in under three hours and recovered biologically consistent findings. Furthermore, an integrated Galaxy workflow demonstrated reproducible detection of sialidase treatment effects. GlyComboCLI substantially reduces the pool of spectra requiring manual structural interpretation, offering a flexible and scalable solution for glycomics bioinformatic workflows.
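A minimal brute-force search shows what composition assignment involves. It enumerates counts of four common monosaccharides and keeps every composition whose neutral mass lies within a ppm tolerance of the mass implied by an observed m/z. The sketch makes several assumptions:

- Glycans are underivatised, with a free reducing end.
- Ions are protonated ([M+zH]z+).
- Each residue has a fixed maximum count.
- Standard monoisotopic residue masses apply.

It is not GlyComboCLI's algorithm, and it omits the derivatisation states, reducing-end modifications, and alternative adducts that the tool supports.

```python
from itertools import product
from typing import Dict, List

# Monoisotopic residue masses (Da) of underivatised monosaccharides.
RESIDUE_MASS: Dict[str, float] = {
    "Hex": 162.05282,
    "HexNAc": 203.07937,
    "dHex": 146.05791,
    "NeuAc": 291.09542,
}
WATER = 18.01056   # added once for a free (unmodified) reducing end
PROTON = 1.00728   # assumed adduct: [M+zH]z+


def assign(mz: float, charge: int = 1, max_count: int = 10,
           tol_ppm: float = 10.0) -> List[Dict[str, int]]:
    """Return every composition whose neutral mass matches the observed m/z within tol_ppm."""
    neutral = mz * charge - charge * PROTON  # observed neutral glycan mass
    names = list(RESIDUE_MASS)
    matches: List[Dict[str, int]] = []
    for counts in product(range(max_count + 1), repeat=len(names)):
        if not any(counts):
            continue  # skip the empty composition
        mass = WATER + sum(n * RESIDUE_MASS[name] for n, name in zip(counts, names))
        if abs(mass - neutral) / neutral * 1e6 <= tol_ppm:
            matches.append({name: n for n, name in zip(counts, names) if n})
    return matches
```

For example, `assign(1235.4407)` should include `{"Hex": 5, "HexNAc": 2}`, a Man5-type composition. Even with only four residue types the search space already holds 11^4 = 14,641 combinations, and it grows exponentially with each added monosaccharide or modification. This growth is why practical tools constrain the search.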

18
Investigation of Protein Melting Temperature Prediction with Cross-Method Validation on Biophysical Data

Pailozian, K.; Kohout, P.; Damborsky, J.; Mazurenko, S.

2026-05-11 bioinformatics 10.64898/2026.05.07.723192 medRxiv
Top 0.3%
8.8%

Motivation: Protein melting temperature (Tm) prediction accelerates the discovery of thermostable enzymes, which are crucial for industrial biotechnology processes that often require harsh reaction conditions. Experimental determination of Tm remains labour-intensive and varies across techniques, motivating the development of in silico predictors. Mass spectrometry datasets such as the Meltome Atlas now enable large-scale Tm prediction with deep learning models, but model generalisation across diverse experimental datasets has not been systematically tested.

Results: We evaluated the generalisability of state-of-the-art deep learning approaches and explored ESM-based embeddings for Tm prediction. To this end, we assembled the ProMelt training dataset (45 441 proteins) and five independent biophysics-based validation datasets. Our analysis revealed substantial differences between proteomics- and biophysics-based Tm measurements, highlighting the challenge of cross-domain generalisation. Existing state-of-the-art predictors trained on large-scale proteomics datasets showed reduced performance on biophysics-based validation sets. Our fine-tuned embedding-based models, particularly LoRA-adapted ESM-2 (TmProt 1.0), outperformed state-of-the-art predictors in identifying thermostable proteins (Tm ≥ 60 °C) across heterogeneous datasets, achieving AUC scores of 0.75-0.77. We also demonstrated that the available models can be used efficiently for sequence prioritization.

Availability: The TmProt web server is available at https://loschmidt.chemi.muni.cz/tmprot/. Source code and data are available at https://github.com/loschmidt/TmProt.
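The thermostability classification metric can be illustrated with a short sketch. Measured Tm is binarised at 60 °C, and the ROC AUC of the predicted values is computed through the Mann-Whitney U statistic. The AUC is then the fraction of (thermostable, non-thermostable) pairs in which the thermostable protein receives the higher prediction, with ties counted as half. The abstract does not specify how TmProt's AUC was computed, so treat this as a generic illustration; the inputs in the test are made up.

```python
from typing import Sequence


def auc_thermostable(true_tm: Sequence[float], predicted: Sequence[float],
                     threshold: float = 60.0) -> float:
    """ROC AUC for separating proteins with measured Tm >= threshold (Mann-Whitney form)."""
    pos = [p for t, p in zip(true_tm, predicted) if t >= threshold]  # thermostable
    neg = [p for t, p in zip(true_tm, predicted) if t < threshold]   # not thermostable
    if not pos or not neg:
        raise ValueError("need at least one protein on each side of the threshold")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ordered pair
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic. For benchmark-sized sets, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` gives the same value faster.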

19
Radiant DIA: A Fast, Sensitive, and Accurate Search Engine for Quantitative Proteomics

Just, S.; Cantrell, L. S.; Nichols, A.; Wang, J.; Kis, J.; Mohtashemi, I.; Platt, T.; Farokhzad, O.; Batzoglou, S.

2026-05-04 bioinformatics 10.64898/2026.04.29.721743 medRxiv
Top 0.3%
8.6%

In mass spectrometry-based proteomics, robust and efficient search engines are essential for accurate peptide and protein identification and quantification. Advances in sample preparation and instrumentation have increased the demand for highly scalable processing tools, with datasets comprising hundreds or thousands of samples in single-cell and population studies. Here we present Radiant DIA, a novel data-independent acquisition (DIA) search engine that achieves 4× faster processing and 10× lower cloud compute costs for large experiments while ensuring rigorous control of the false discovery rate (FDR) and maintaining similar sensitivity, precision, and quantitative accuracy. The Radiant DIA search engine is paired with a modular pipeline, the Fulcrum Pipeline, that is deployable on cloud and desktop environments and comprises individual modules for distributed rescoring, FDR estimation, protein inference, and quantification. Unlike traditional monolithic applications, this architecture enables high-performance, cloud-scale analysis without sacrificing local usability. Together, the Radiant DIA and Fulcrum Pipeline tools improve computational efficiency to facilitate biological discovery in large-scale proteomics, as demonstrated by analyses of real-world experiments comprising up to thousands of MS acquisitions.
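FDR control in proteomics search engines is commonly based on target-decoy competition. The sketch below shows that generic idea in three steps:

1. Rank matches by score.
2. Estimate the FDR at each threshold as decoys / targets.
3. Convert the estimates to q-values by taking the running minimum from the lowest-scoring match upwards.

The abstract does not say which FDR procedure Radiant DIA or the Fulcrum Pipeline uses, so this is an assumption-based illustration. Score ties and any decoy-count corrections are ignored.

```python
from typing import List, Tuple


def q_values(matches: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Target-decoy q-values.

    matches: (score, is_decoy) pairs; higher scores are better.
    Returns (score, is_decoy, q_value) sorted by descending score.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    fdr: List[float] = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the score threshold were set at this match.
        fdr.append(decoys / max(targets, 1))
    # q-value: the smallest FDR at which this match would still be accepted,
    # i.e. the running minimum of the FDR estimates from the bottom of the list up.
    q = [0.0] * len(fdr)
    running = float("inf")
    for i in range(len(fdr) - 1, -1, -1):
        running = min(running, fdr[i])
        q[i] = running
    return [(s, d, qv) for (s, d), qv in zip(ranked, q)]
```

Keeping target matches with q ≤ 0.01 then corresponds to a 1% FDR cutoff. Because each threshold only needs cumulative counts, this step parallelises well across runs once scores are computed.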

20
NativeReady: an open benchmark and sequence-based triage model for native mass spectrometry suitability

Znabu, B. F.; Atif, Z.

2026-05-06 bioinformatics 10.64898/2026.05.03.722506 medRxiv
Top 0.3%
8.3%

Native mass spectrometry is a central analytical method for characterizing intact proteins, antibody-drug conjugates, and non-covalent assemblies, and it is increasingly the deciding measurement in biotherapeutic development pipelines. A single screening attempt requires days of expression, purification, and buffer exchange into ammonium acetate, followed by 30 to 60 minutes of optimization on a Q-Exactive UHMR or comparable instrument. To our knowledge, no published sequence-based predictor currently estimates native MS suitability before experimental screening.

We curated 634 unique proteins with documented native MS outcomes, drawn from three sources: a 232-protein hand-curated base set, 358 entries recovered from RCSB PDB by full-text searching for native MS terminology, and 44 evidence-based extractions from supplementary tables across 80 EuropePMC papers. We trained four model variants on this benchmark: a 36-feature BioPython physicochemical baseline, an ESM-2 linear probe, an ESM-2 PCA-256 random forest, and a combined model that concatenates ESM-2 PCA components with BioPython features. All variants were evaluated under cluster-aware 5-fold cross-validation (GroupKFold over ESM-2 embedding-similarity clusters, our defense against homology leakage) with isotonic calibration. Standard stratified 5-fold cross-validation is reported as a sensitivity analysis.

Under cluster-aware cross-validation, the combined model achieved an AUC of 0.869 ± 0.036, close to its stratified-CV value (0.873) and above the BioPython baseline (0.852). The ESM-2-only variants showed AUC drops of 0.024 to 0.046 between stratified and cluster-aware splits, indicating that some of the apparent ESM-2 contribution under standard CV reflects homology leakage. Negative recall was 9.4 percent under cluster-aware splitting versus 26.0 percent under stratified splitting, confirming that within-fold homology substantially inflated the models' apparent ability to detect failures. We report both numbers and treat the cluster-aware values as the primary results.

We release the curated dataset, the trained model, and an interactive web tool at nativeready.netlify.app. In its current form, NativeReady should be interpreted primarily as a positive-suitability triage tool; failure prediction remains limited by the scarcity of experimentally documented negative cases. We propose a user-contribution mechanism to accumulate real failure data over time. NativeReady is, to our knowledge, the first open benchmark and triage model specifically designed for this task.
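A cluster-aware split can be sketched without any libraries. Every member of a similarity cluster goes to the same fold, so no cluster contributes to both training and test data. The greedy assignment below places the largest cluster first, each into the fold that currently holds the fewest samples. This is similar to the default balancing strategy of scikit-learn's `GroupKFold`, but it is a hypothetical re-implementation, not NativeReady's code. The cluster labels are assumed to come from a separate embedding-clustering step.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def group_kfold(groups: Sequence[Hashable],
                n_splits: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into folds so that each cluster lands in exactly one test fold."""
    sizes = Counter(groups)
    if len(sizes) < n_splits:
        raise ValueError("need at least as many clusters as folds")
    fold_of = {}
    fold_sizes = [0] * n_splits
    # Largest clusters first, each into the fold that currently holds the fewest samples.
    for cluster, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        fold = fold_sizes.index(min(fold_sizes))
        fold_of[cluster] = fold
        fold_sizes[fold] += size
    splits = []
    for fold in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        splits.append((train, test))
    return splits
```

A standard stratified split can place close homologs on both sides of the train/test boundary. This grouping rules that out, which is the homology leakage the cluster-aware evaluation is meant to expose.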